% Chapter 1
\chapter{Motivation}
\label{Chapter1}
\lhead{Chapter 1. \emph{Motivation}}
This chapter outlines the reasons for undertaking this project and the problems that it aims to solve.
\section{The Problem}
\begin{quote}
\textit{`Newspapers are dead but it will take a while for the body to cool down.' \footnote{ Quote by Amanda Kiss, Media Correspondent for the Guardian.}} 
\end{quote}
The `old media' industry, of which newspapers are an integral part, has been in steady decline for the past twenty years. The sharp fall in advertising revenue has been compounded by the changing reading habits of a generation brought up on instant and freely available information on the internet.

This has had a crippling effect on small and local newspapers in particular, with the collapse of the traditional business model resulting in the closure of over sixty UK local newspapers in the past year alone\footnote{ Mourning the death of local newspapers, URL: \url{http://www.guardian.co.uk/commentisfree/2009/apr/26/local-newspapers} [Last Accessed 13/05/09].}.

This does not represent a lack of desire for news content, simply a change in the way it is consumed: the popularity of on-line news services has grown exponentially over the last few years, and with the rise of smartphones\footnote{ Mobile phones capable of offering e-mail, internet and other advanced features.} and the 3G network\footnote{ Third Generation of mobile networking technology offering high-speed internet access.}, these services are now capable of offering ubiquitous access to their content.

The internet has also provided the perfect platform for participatory journalism, empowering individuals to become involved in the collection, reporting and dissemination of news. This has significantly increased the amount of information available about any local area.

The problem is that this information is dispersed over such a wide variety of web sites that it is difficult for a casual user to be aware of it all. Although services currently exist that collect online news and attempt to automatically organise the content by location, they fail to deliver this information categorised by topic. Even though the news is now mostly consumed online rather than in print, users still expect to see articles organised at the very least into the various sections they would find in their local newspaper.

An online aggregation service which offered a broader range of content and a more valuable experience to users could be highly attractive to local publishers who must broaden their audience in order to address the change in their traditional business model.

\section{Currently Available Services}
Online services that currently offer local news do so with various levels of automation and cover differing ranges of geographical locations. As can be seen in Figure~\ref{fig:alternatives}, only web sites which automate the process can cover a large number of local areas.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=5in]{./Figures/alternatives.pdf}
\end{center}
\caption[Current Online Local News Services]{Current online local news services organised by area of coverage and level of automation.}
\label{fig:alternatives}
\end{figure}

\clearpage

These services can be divided into four distinct categories:

\subsection{Specialised Local News Web Sites}
Web sites such as these are usually the online version of a local print newspaper. An example of such a site would be thisisbristol.co.uk from the Bristol Evening Post. They are manually edited sites providing news articles written by professional journalists. The content is organised into the various categories that would be found in the printed version of the paper.

Although they provide thorough and well-edited content, these web sites are only available for locations where there is a sufficiently large target audience for them to be financially viable. They also focus almost solely on their own original content, limiting the scope of information provided, although many encourage user contributions as a way of creating communities around their sites.

\subsection{Manually Edited Portals}
These are usually run by major media organisations, such as the BBC\footnote{ BBC Online, URL: \url{http://www.bbc.co.uk/} [Last Accessed 17/05/09].}, which employ a large number of journalists and editorial staff across the country. This enables them to maintain and update individual web sites for each region they cover. This is done in a similar fashion to the specialised web sites mentioned above, with almost all content professionally written by their own journalists.

They therefore share the same drawbacks, suffering from limited scalability. Each web site represents an entire region, rather than any smaller geographical area.
\subsection{`Hyperlocal' Information Services}
A relatively new concept, `hyperlocal' web sites provide packages of information gathered from news sources and government data for local areas. These services can be tailored to a particular neighbourhood or even an individual street. Examples of such sites include Outside.in\footnote{ Outside.in, URL: \url{http://outside.in/} [Last Accessed 17/05/09].} and PlaceBlogger\footnote{ PlaceBlogger, URL: \url{http://placeblogger.com/} [Last Accessed 17/05/09].}.

Although this information is collected and displayed automatically, it requires specific tailoring for each location to be represented. This means that these services are generally only available for certain cities. 

\subsection{News Aggregators}
News aggregation services collect links to articles from numerous other sources, which are then organised and displayed in some pre-determined fashion, usually by topic or importance. This process can be done manually, as on the influential Huffington Post\footnote{ Huffington Post, URL: \url{http://www.huffingtonpost.com/} [Last Accessed 17/05/09].} or Drudge Report\footnote{ Drudge Report, URL: \url{http://www.drudgereport.com/} [Last Accessed 17/05/09].} web sites, or automatically, as on Topix\footnote{ Topix, URL: \url{http://www.topix.com/} [Last Accessed 17/05/09].} or Google News\footnote{ Google News, URL: \url{http://news.google.com} [Last Accessed 17/05/09].}.

\textbf{Google News} is one of the largest news aggregation services, collecting articles from over 4,500 sources\footnote{ About Google news, URL: \url{http://news.google.com/intl/en_us/about_google_news.html} [Last Accessed 13/05/09].}. It ranks and lists news articles according to their `importance' as decided by its StoryRank algorithm (similar to PageRank \cite{pagerank}). It also performs text categorisation on them, allowing users to browse all the news about a certain topic.

In 2008, Google released a local news service which takes the name of any location in the world and returns a list of articles believed to be relevant to it. This location analysis is, however, relatively naive, simply returning any article in which the name of the location, or that of a neighbouring location, is present. It also fails to provide any structure to the results, neither identifying article topics nor arranging them into categories.

\section{Proposed Solution}
As discussed in the previous section, there are no services currently available which are able to automatically use the vast quantities of news distributed over the internet to produce a categorised set of articles relevant to a particular location. Although it is difficult to know for certain, as Google remains very secretive about its technology, we have theorised two possible reasons for the lack of topic categorisation in the results of its local news service:

\begin{itemize}
\item Firstly, it is possible that Google only assigns categories to articles if the engine is highly confident that it has classified them correctly, meaning that most have no topic metadata added to them.
\item Secondly, with some locations having very little written about them, it is also possible that pre-defined sections are not used as, in the majority of cases, there would not be a sufficient number of articles to populate them satisfactorily.
\end{itemize}

This project has built upon current news aggregation tools to improve the article acquisition and classification processes and create a fully automated local newspaper generator, tackling the two issues raised above. Its main objective has been to avoid tailoring the solution to any set of topics or locations, allowing the inclusion of additional training data to increase the number of topics or regional coverage. 

\section{Project Aims}

The aim of this project is to build a local newspaper generator capable of producing a structured set of articles about a desired location. This will involve research and implementation of techniques to:

\begin{itemize}
\item Quickly and accurately extract full article content out of a web page with minimal overhead
\item Automatically categorise articles into specific sections
\item Provide articles relevant to the desired location
\item Order articles according to their relevance to the given location
\item Automatically widen the scope of the search if not enough information is available about the desired location
\item Ensure all methods remain generic, allowing the product to be scaled to cover any topic or geographical area
\end{itemize}
